Welcome to our introductory lab, where we'll explore the basics of creating data visualizations and converting them to HTML, a widely accessible format for online publishing. The lab gives a first look at the end-to-end process of writing and sharing analyses with three Python data visualization packages: Matplotlib, Seaborn, and Plotly.
mermaid
graph LR
Package[Data Visualization]-->matplotlib[Matplotlib - 2D plotting library];
Package[Data Visualization]-->seaborn[Seaborn - Statistical data visualization];
Package[Data Visualization]-->plotly[Plotly - Interactive plotting library];
| Package | Description | Installation |
|---|---|---|
| Matplotlib | Matplotlib is a 2D plotting library for Python. It provides a wide variety of static, animated, and interactive plots. It is widely used for creating publication-quality visualizations. | pip install matplotlib |
| Seaborn | Seaborn is built on top of Matplotlib and provides a high-level interface for drawing attractive and informative statistical graphics. It simplifies the process of creating complex visualizations with less code. | pip install seaborn |
| Plotly | Plotly is an interactive plotting library that allows users to create dynamic and interactive visualizations. It supports a wide range of chart types and can be used for creating dashboards and web-based applications. | pip install plotly |
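Before running the examples, it can help to confirm that the three packages from the table are installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the "Installation" column above.
PACKAGES = ("matplotlib", "seaborn", "plotly")


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        if ver is not None:
            status = ver
        else:
            status = f"not installed (try: pip install {name})"
        print(f"{name}: {status}")
```

A package reported as missing can be installed with the matching `pip install` command from the table.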
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import plotly
The code uses Matplotlib to draw a hexagonal binning plot of 100,000 correlated random points, first with a linear color scale and then with a logarithmic one.
# Fixing random state for reproducibility
np.random.seed(19680801)
n = 100_000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
xlim = x.min(), x.max()
ylim = y.min(), y.max()
fig, (ax0, ax1) = plt.subplots(ncols=2, sharey=True, figsize=(9, 4))
hb = ax0.hexbin(x, y, gridsize=50, cmap='inferno')
ax0.set(xlim=xlim, ylim=ylim)
ax0.set_title("Hexagon binning")
cb = fig.colorbar(hb, ax=ax0, label='counts')
hb = ax1.hexbin(x, y, gridsize=50, bins='log', cmap='inferno')
ax1.set(xlim=xlim, ylim=ylim)
ax1.set_title("With a log color scale")
# bins='log' only changes the color scale; the colorbar ticks still show counts
cb = fig.colorbar(hb, ax=ax1, label='counts')
plt.show()
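To see why the log color scale helps, it can be useful to look at the bin counts directly. The sketch below is a rectangular-bin analogue built with `np.histogram2d`, not the hexagonal binning that `hexbin` performs, and it uses NumPy's newer `default_rng` generator, so the exact numbers differ from the plot above.

```python
import numpy as np

rng = np.random.default_rng(19680801)
n = 100_000
x = rng.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * rng.standard_normal(n)

# Count the points that fall in each cell of a 50 x 50 rectangular grid.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=50)

# Every point lands in exactly one cell, so the counts add up to n.
assert counts.sum() == n

# On a linear scale the few densest cells dominate the color range ...
nonempty = counts[counts > 0]
print("max count:", int(nonempty.max()))
print("median count of non-empty cells:", float(np.median(nonempty)))

# ... while log10 compresses that range. Empty cells are left out
# because log10(0) is undefined.
log_counts = np.log10(nonempty)
print("log10 range:", float(log_counts.min()), "to", float(log_counts.max()))
```

Comparing the maximum with the median shows how skewed the counts are, which is why sparse outer bins are hard to distinguish on the linear scale.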
The code uses Matplotlib to plot the electric potential of a dipole as iso-contours on a triangular mesh, with arrows showing the direction of the electric field (the negative gradient of the potential).
from matplotlib.tri import (CubicTriInterpolator, Triangulation,
                            UniformTriRefiner)
# ----------------------------------------------------------------------------
# Electrical potential of a dipole
# ----------------------------------------------------------------------------
def dipole_potential(x, y):
    """The electric dipole potential V, at position *x*, *y*."""
    r_sq = x**2 + y**2
    theta = np.arctan2(y, x)
    z = np.cos(theta)/r_sq
    return (np.max(z) - z) / (np.max(z) - np.min(z))
# ----------------------------------------------------------------------------
# Creating a Triangulation
# ----------------------------------------------------------------------------
# First create the x and y coordinates of the points.
n_angles = 30
n_radii = 10
min_radius = 0.2
radii = np.linspace(min_radius, 0.95, n_radii)
angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1)
angles[:, 1::2] += np.pi / n_angles
x = (radii*np.cos(angles)).flatten()
y = (radii*np.sin(angles)).flatten()
V = dipole_potential(x, y)
# Create the Triangulation; no triangles specified so Delaunay triangulation
# created.
triang = Triangulation(x, y)
# Mask off unwanted triangles.
triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1),
                         y[triang.triangles].mean(axis=1))
                < min_radius)
# ----------------------------------------------------------------------------
# Refine data - interpolates the electrical potential V
# ----------------------------------------------------------------------------
refiner = UniformTriRefiner(triang)
tri_refi, z_test_refi = refiner.refine_field(V, subdiv=3)
# ----------------------------------------------------------------------------
# Computes the electrical field (Ex, Ey) as gradient of electrical potential
# ----------------------------------------------------------------------------
tci = CubicTriInterpolator(triang, -V)
# Gradient requested here at the mesh nodes but could be anywhere else:
(Ex, Ey) = tci.gradient(triang.x, triang.y)
E_norm = np.sqrt(Ex**2 + Ey**2)
# ----------------------------------------------------------------------------
# Plot the triangulation, the potential iso-contours and the vector field
# ----------------------------------------------------------------------------
fig, ax = plt.subplots()
ax.set_aspect('equal')
# Enforce the margins, and enlarge them to give room for the vectors.
ax.use_sticky_edges = False
ax.margins(0.07)
ax.triplot(triang, color='0.8')
levels = np.arange(0., 1., 0.01)
ax.tricontour(tri_refi, z_test_refi, levels=levels, cmap='hot',
              linewidths=[2.0, 1.0, 1.0, 1.0])
# Plots direction of the electrical vector field
ax.quiver(triang.x, triang.y, Ex/E_norm, Ey/E_norm,
          units='xy', scale=10., zorder=3, color='blue',
          width=0.007, headwidth=3., headlength=4.)
ax.set_title('Gradient plot: an electrical dipole')
plt.show()
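The relation used above, field equals negative gradient of the potential, can be checked numerically. The sketch below assumes the unnormalized potential cos(theta)/r**2 (equivalently x/r**3) rather than the rescaled version returned by `dipole_potential`, and compares a hand-derived gradient with central finite differences; the function names `potential` and `field` are illustrative.

```python
import numpy as np


def potential(x, y):
    """Unnormalized dipole potential cos(theta) / r**2, written as x / r**3."""
    return x / (x**2 + y**2) ** 1.5


def field(x, y):
    """Analytic field E = -grad(V) for the potential above."""
    r2 = x**2 + y**2
    ex = (2 * x**2 - y**2) / r2**2.5  # -dV/dx
    ey = 3 * x * y / r2**2.5          # -dV/dy
    return ex, ey


# Sample points on a ring, away from the singularity at the origin.
theta = np.linspace(0, 2 * np.pi, 12, endpoint=False)
px, py = 0.5 * np.cos(theta), 0.5 * np.sin(theta)

# Central finite differences approximate the partial derivatives of V.
h = 1e-6
ex_num = -(potential(px + h, py) - potential(px - h, py)) / (2 * h)
ey_num = -(potential(px, py + h) - potential(px, py - h)) / (2 * h)

ex, ey = field(px, py)
x_ok = np.allclose(ex, ex_num, rtol=1e-5)
y_ok = np.allclose(ey, ey_num, rtol=1e-5)
print("Ex matches:", x_ok, "| Ey matches:", y_ok)
```

In the plot, each vector is divided by `E_norm`, so the arrows show only the direction of the field, not its magnitude.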
The code uses Seaborn to create a kernel density estimate (KDE) plot showing the distribution of cut grades in the diamonds dataset, conditional on carat. It follows the Seaborn gallery example at https://seaborn.pydata.org/examples/multiple_conditional_kde.html
sns.set_theme(style="whitegrid")
# Load the diamonds dataset
diamonds = sns.load_dataset("diamonds")
# Plot the distribution of cut grades, conditional on carat
sns.displot(
    data=diamonds,
    x="carat", hue="cut",
    kind="kde", height=6,
    multiple="fill", clip=(0, None),
    palette="ch:rot=-.25,hue=1,light=.75",
)
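The `multiple="fill"` option rescales the stacked densities so that they sum to 1 at every x value, which turns the plot into a picture of group proportions. The sketch below illustrates that idea with NumPy on two synthetic groups; it is a simplified illustration with a fixed, assumed bandwidth, not Seaborn's implementation.

```python
import numpy as np


def gaussian_kde(samples, grid, bandwidth):
    """Plain Gaussian kernel density estimate of `samples` evaluated on `grid`."""
    z = (grid[:, np.newaxis] - samples[np.newaxis, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
# Two synthetic groups standing in for two levels of a hue variable.
groups = {
    "a": rng.normal(0.5, 0.2, size=300),
    "b": rng.normal(1.2, 0.4, size=100),
}
grid = np.linspace(0.0, 2.5, 200)
total = sum(len(s) for s in groups.values())

# Weight each group's density by its share of the observations,
# so that larger groups contribute more.
weighted = np.array([
    len(s) / total * gaussian_kde(s, grid, bandwidth=0.1)
    for s in groups.values()
])

# "Fill" step: divide by the column sums so the groups add up to 1 at every x.
proportions = weighted / weighted.sum(axis=0)
print(proportions.sum(axis=0)[:5])
```

Read this way, the height of each colored band at a given carat value is the estimated share of diamonds with that cut.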
The code uses Seaborn to plot 20 short random walks of five steps each in a grid of small panels, one per walk. Each panel shows the trajectory of its walk and a dotted reference line at the starting position.
sns.set_theme(style="ticks")
# Create a dataset with many short random walks
rs = np.random.RandomState(4)
pos = rs.randint(-1, 2, (20, 5)).cumsum(axis=1)
pos -= pos[:, 0, np.newaxis]
step = np.tile(range(5), 20)
walk = np.repeat(range(20), 5)
df = pd.DataFrame(np.c_[pos.flat, step, walk],
                  columns=["position", "step", "walk"])
# Initialize a grid of plots with an Axes for each walk
grid = sns.FacetGrid(df, col="walk", hue="walk", palette="tab20c",
                     col_wrap=4, height=1.5)
# Draw a horizontal line to show the starting point
grid.refline(y=0, linestyle=":")
# Draw a line plot to show the trajectory of each random walk
grid.map(plt.plot, "step", "position", marker="o")
# Adjust the tick positions and labels
grid.set(xticks=np.arange(5), yticks=[-3, 3],
         xlim=(-.5, 4.5), ylim=(-3.5, 3.5))
# Adjust the arrangement of the plots
grid.figure.tight_layout(w_pad=1)
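The data preparation at the top of the random-walk example is compact, so here is a smaller version of the same steps with three walks instead of twenty. It is a sketch for inspection only; the variable names `n_walks`, `n_steps`, and `long_form` are introduced here for illustration.

```python
import numpy as np

rs = np.random.RandomState(4)
n_walks, n_steps = 3, 5

# Each step is -1, 0, or +1 (the upper bound of randint is exclusive).
steps = rs.randint(-1, 2, (n_walks, n_steps))

# A running sum along each row turns the steps into positions ...
pos = steps.cumsum(axis=1)
# ... and subtracting the first column makes every walk start at 0.
pos -= pos[:, 0, np.newaxis]

# Long-form index columns: `step` cycles 0..4 within each walk,
# `walk` repeats each walk id once per step.
step = np.tile(range(n_steps), n_walks)
walk = np.repeat(range(n_walks), n_steps)

# Flattening pos row by row matches the order of `step` and `walk`.
long_form = np.c_[pos.ravel(), step, walk]
print(long_form)
```

Each row of `long_form` is one (position, step, walk) observation, which is the long-form layout that `FacetGrid` expects when it splits the data by the `walk` column.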
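The code uses Plotly Express to draw a polar bar chart (a wind rose) from the built-in wind dataset. Each bar shows how often the wind blows from a given direction, with color indicating wind strength.
plotly.offline.init_notebook_mode()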
df = px.data.wind()
fig = px.bar_polar(df, r="frequency", theta="direction", color="strength",
                   template="plotly_dark",
                   color_discrete_sequence=px.colors.sequential.Plasma_r)
fig.show()
The code uses Plotly Express to create a parallel categories plot of the "tips" dataset. Ribbons connect the categories of each low-cardinality column, and their color encodes the party size (the `size` column).
plotly.offline.init_notebook_mode()
df = px.data.tips()
fig = px.parallel_categories(df, color="size", color_continuous_scale=px.colors.sequential.Inferno)
fig.show()